Currently, a multimedia apparatus generally includes two parts: image display and sound playback. Because image display and sound playback are typically performed by different devices, the two parts are associated with each other only in time, not in space. The size of common multimedia apparatuses varies greatly, ranging from a few inches (such as a mobile phone or a tablet computer) to tens of inches (such as a laptop, a desktop computer, or a television screen) and even to hundreds of inches (such as an outdoor advertising screen); accordingly, the size and distribution of the corresponding sound playback devices also vary greatly. Most current video file formats fail to take the spatial information of sounds into account, making it difficult for a client to accurately reproduce the sound effect.